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Abstract 

The genetic basis of blood pressure often involves multiple genetic factors and their interactions with 
environmental factors. Gene-environment interaction is assumed to play an important role in determining 
individual blood pressure variability. Older people are more prone to high blood pressure than younger ones and 
the risk may not display a linear trend over the life span. However, which gene shows sensitivity to aging in its 
effect on blood pressure is not clear. In this work, we allowed the genetic effect to vary over time and propose a 
varying-coefficient model to identify potential genetic players that show nonlinear response across different age 
stages. We detected 2 novel loci, gene MIR1263 (a microRNA coding gene) on chromosome 3 and gene UNC13B 
on chromosome 9, that are nonlinearly associated with diastolic blood pressure. Further experimental validation is 
needed to confirm this finding. 



Background 

The genetic basis of a complex trait often involves multi- 
ple genetic factors functioning in a coordinated manner. 
The extent to which our genetic blueprint expresses also 
depends on the interactions between genetic and envir- 
onmental factors. Increasing evidence shows the impor- 
tance of gene-environment (G x E) interactions in 
determining the risk of a variety of diseases such as 
respiratory diseases [1], obesity [2], and psychiatric disor- 
ders [3] . For a review of G x E interaction, see the work 
of Hunter [4]. The empirical evidence underscores the 
importance of developing novel statistical approaches to 
identify major genetic players that are sensitive to envir- 
onmental stimuli and to further understand how they 
function. 

Blood pressure is a heritable trait influenced by several 
biological pathways sensitive to environmental stimuli. 
High blood pressure, or hypertension, affects more than 
1 billion people worldwide. It damages an individual's 
body in many ways over time, leading to heart disease, 
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stroke, kidney failure, and other health problems [5]. Age 
is known to be a risk factor for high blood pressure. 
Systolic blood pressure rises with age, whereas the diasto- 
lic blood pressure tends to fall. For people with preexist- 
ing high blood pressure, this age-related pattern occurs 
even if the blood pressure is well controlled with medica- 
tion [6], The reasons why blood pressure changes with 
age are still poorly understood, but are a topic of intense 
research. Thus, age should be an important predictor 
when searching for genetic players responsible for hyper- 
tension. However, few studies have considered an age- 
dependent mechanism in their analysis. 

The genetic response to age in blood pressure fits in well 
with the classical G x E interaction framework. G x E 
interaction typically refers to the manner in which geno- 
types influence phenotypes differently in different environ- 
ments [7]. From a biological point of view, G x E 
interaction can be better viewed as the genetic responses 
to environment changes or stresses [8]. Statistically, inter- 
action is considered as a departure from additivity when 
fitting a linear regression model with 1 or more product 
terms, for example, 
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Y = a 0 + ai X + fixG + p 2 XG + e = (a 0 + ai X) + (ft + fcX) G + <? (1) 

where Y is a quantitative trait (diastolic blood pressure 
in this analysis), G is the genetic variable, X is the envir- 
onmental variable (age), and is the error term. This is a 
classical linear model for G x E interaction analysis. As 
can be seen, equation (1) automatically assumes a linear 
interaction mechanism between G and X because the 
coefficient for G is a linear function in X. However, the 
contribution of the same gene to blood pressure may be 
quite different at different age levels. This nonlinear 
penetrance can be well understood by a statistical vary- 
ing-coefficient (VC) model [9]. VC models allow the 
coefficients to change smoothly and nonlinearly with 
other variables so that one can explore the dynamic fea- 
ture of a response over time with great flexibility and 
nice interpretability [10]. 

In this work, we applied VC models to detect genetic 
variants associated with diastolic blood pressure from the 
Genetic Analysis Workshop 18 (GAW18) data with 142 
unrelated individuals. We allowed the contribution of 
genetic variants to blood pressure to vary over time via 
varying coefficients. We further proposed a sequence of 
hypothesis tests to evaluate whether the effect of a genetic 
variant is sensitive to aging, and if it is, is it in a linear or 
nonlinear fashion? Using this analysis, we identified 2 
novel loci that show nonlinear effects over time to affect 
blood pressure. 

Methods 

The model 

The nonlinear VC model is defined as 

Y = m(X,G)+a (X) e (2) 

for given (X, G) and the response Y with E(e\X, G) = 0 
and Var(e—X, G) = 1; a 2 (X) = Var{Y\X, G) is the condi- 
tional variance function. The mean function is defined 
as m[X, G) = a{X) + fi(X)G, where (X) is a smoothing 
function in X- Under the VC modeling framework, the 
effect of a gene is allowed to vary as a function of envir- 
onmental factors, either linearly or nonlinearly, captured 
by the model itself. Thus, the VC model has the poten- 
tial to dissect the nonlinear penetrance of genetic var- 
iants. Here we also allow nonlinear function of X with Y 
modeled by a (X). This nonlinear term adjusts the non- 
linear effect of X when estimating the nonlinear effect 
of P{X). If we take a (X) = a 0 + a\X, equation (1) is just 
a special case of the VC model when fi(X) = f}\ + p 2 X. 

Hypothesis testing 

The following list shows all 4 mean models involved in 
our analysis. 

♦ Model 1: m{X, G) = a{X), no genetic effect at all; 



• Model 2: m[X, G) = a{X) + fiG, linear genetic effect 
without interaction; 

. Model 3: m(X, G) = a(X) + (/S 0 + P\X)G, linear 
genetic effect with interaction; and 
. Model 4: m{X,G) = a(X) + /3{X)G, nonlinear 
genetic effect. 

We first assess whether the genetic coefficients vary 
with x by formulating the following hypotheses, 

H l Q :P(X)=p for some constant p vs. H\ : p (X) ? p for any p 

Rejecting the null indicates that potential gene-age (G x 
age) interaction may exist. Otherwise, we conclude there 
is no G x age effect and we fit mean model 2 to test for 
association. Because the traditional linear interaction 
model given in equation (1) is a special case of the pro- 
posed VC model, we next test significance of a linear effect 
if the above null is rejected, by formulating the following 
hypotheses, 

| H 2 : ft (X) = p 0 + PiX for some constants p 0 , Pi 
[Hl-.p (X) ^ p 0 + frX for any 0 O , ft 

Failure to reject the null indicates that there is a linear 
G x age effect, so we fit mean model 3 to assess associa- 
tion. Otherwise, we conclude that the G x age interac- 
tion is nonlinear. We then assess the nonlinear genetic 
effect over age by formulating the hypotheses, 

H 3 0 :P(X) = 0,vsH 3 a :P(X) ^0 

The rejection of the null indicates that the genetic 
effect is sensitive to age in a nonlinear fashion. The 
sequence of hypothesis tests stated above was suggested 
by Ma et al [9] for optimal power to detect association. 

Model implementation 

We fit the varying coefficients with a B-spline technique 
for both a( ) and yf3( ) functions. The X variable was first 
transformed to make it more evenly distributed on each 
subinterval used in the B-spline smoothing technique. 
The great advantages of B-spline estimation over other 
nonparametric techniques are simple implementation 
and fast computing [9]. For each single-nucleotide poly- 
morphism (SNP), a(X) in mean model 1 was estimated 
by considering the following least square problem, 

En ( 1 2 

The estimated a(X) has the form a(x) = y^ N * p * 1 x s B s (x), 
where N is the number of interior knots, p is the degree of 

B-splines, and G p = {B s } s =i,2 N+p+i is the set of basis B- 

splines with degree p. For selecting the number of knots N 
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and the degree p of the B-splines, we used the Bayesian 
information criterion (BIC), 

argmin[ N , P )Bic(N,p) = argmin^ N:P )i og ^2 )+{N+p) i ogW /n, 

where f 2 = 1/n {Y — m (X;, G;)} 2 Then the same 

number of knots N a and degree p a were applied to esti- 
mate function a(X) when fitting mean models 2 to 4. 

For mean model 4, the coefficient functions a{x) and 
P (x) were estimated by, 



N a +p Q +l N+p+1 

y, - e fi t (*>) - £ * A G - 



Thus we have a (x) = j #tB t (x) and 

P[x) = ^ ^ P " + ^sB s (x), where Ng and p/3 are also 

selected following the above BIC criterion. 

The error term cr(X) can be assumed homogeneous 
following a normal distribution or heterogeneous 
without assuming a parametric distribution. When 
the homogeneous assumption is made, the likelihood 
ratio test can be applied to assess the significance 
of Hq. Otherwise a 2 {x) can be nonparametri- 
cally estimated using the spline approximation 

a 1 (x) 2^ v s B s (x) and defining a 2 (x) = ^ fjB s (*) 

as the spline estimate, where y s = (i>i,v 2 , ••• ,Vn+p+i) T 

E" =1 j^ 2 (Xi) - E^ P+1 VsBs (X,) j ; that is> 

where 



minimizes 



v = argmin v (e 2 — Bv) T {e 2 ■ 

6 2 = ((Yi -m(Xi,Gi)) 2 ,- 

/Bi(Xi)B 2 CXi)- 
Bi(X 2 ) B 2 (X 2 ) • 



• ,(Y„ 



m 



(X„, G„)) 2 )' 



and 



Thi 



VBi(X n ) B 2 (X n ) •• 
we have 



Bn+p+i(Xi) > 
Bn+p+i(X 2 ) 



BN+p+i(X n ) / 

v=(B T BY l B T e 2 , 



and 
Wild 



(ct 2 (X 1 ),--- ,a 2 (X„)) T = Bv = B(B t B)~ l B T e 2 . 
bootstrap can be applied to assess the significance of Hq 
[11]. 

Results 

We applied the above models to the GAW18 genome- 
wide association data. We focused our analysis on dia- 
stolic blood pressure (DBP) to identify any genetic 
players that can explain the variability of DBP triggered 
by nonlinear genetic penetrance over time. We treated 
DBP as the response Y and age as the X variable. The 
genetic variable G is coded following an additive model, 
that is, G = 1, 0, -1, corresponding to genotype AA, Aa, 
aa, respectively. In total, 142 individuals and 388,099 



SNPs were left after removing SNPs with a minor allele 
frequency less than 0.05. These SNPs are distributed on 
odd-numbered chromosomes from chromosome 1 to 
chromosome 21. 

Figure 1 shows the Manhattan plots of p values when 
assessing significance by fitting different models. The 
overall p value patterns for the 3 models are quite similar. 
Two known and 1 unknown gene show strong nonlinear 
genetic effects (indicated by small p values in columns 7 
and 8 in Table 1). A strong signal was detected in chro- 
mosome 3 containing a microRNA coding gene, 
MIR1263, and in chromosome 9 containing the gene 
UNC13B. MIR1263 may play a regulatory role. The sig- 
nals at gene UNC13B are quite consistent for the 3 mod- 
els. This gene was reported to be related with increased 
risk of nephropathy in patients with type 1 diabetes. 
Nephropathy accounts for 40% of end-stage renal disease 
and is associated with high cardiovascular morbidity and 
mortality [12]. 

Table 1 lists SNPs with p values less than 5 x 10~ 7 . 
These SNPs show strong nonlinear effects over time to 
affect DBP, especially for SNPs in chromosome 3 (indi- 
cated by small p42 and p43 in Table 1). These SNPs can 
be easily missed by fitting traditional linear models. For 
illustration purposes, Figure 2 shows the fitted mean 
DBP function and the genetic effects of 2 SNPs in genes 
MIR1263 and UNC13B. For SNP rs9863717 in gene 
MIR1263, DBP decreases after age 55 years for indivi- 
duals carrying genotype GG, whereas it increases for 
individuals carrying genotype AA. Thus, for a senior per- 
son who carries genotype AA at this locus, the chance to 
develop hypertension is higher than for others. For SNP 
rsl0972462 in gene UNC13B, large DBP variability 
among the 3 genotype groups is observed after age 50 
years, and a decreasing pattern is observed roughly after 
age 65 years. Among the 3 genotype groups, DBP is 
higher in the GG group, followed by the GA and AA 
groups. From the prevention and therapeutic point of 
view, people carrying genotype GG at rsl0972462 locus 
should pay special attention after age 50 years, and so 
should those carrying AA genotype at rs9863717 locus 
after age 65 years. 

Discussion and conclusions 

In this work, we proposed to model the genetic effect as a 
nonlinear function of age. It is clear that the classical lin- 
ear model, with or without interaction, is just a special 
case of the VC model. However, the VC model has the 
flexibility to capture potential nonlinear genetic effects 
over time. Evidence of nonlinear genetic effects has been 
reported previously. For example, Laitala et al [13] 
reported the curvilinear genetic effect on interindividual 
differences in coffee consumption over age. In a study of 
congenital scoliosis in mice [14], the authors found that 
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Figure 1 Manhattan plots of the p values for assessing significance of: A, Ho : ft = 0 by fitting mean model 2; (B) Ho : /3o = Pi = 0 

by fitting mean model 3; and (C) Ho : P{X) = 0 by fitting mean model 4. Solid red, blue, and gray lines correspond to significance levels of 
10~ 6 , 10 -6 . an d 10 5 ' respectively. 



mutations in genes HES7 and MESP2 are sensitive to differ- 
ent degrees of hypoxia, which is responsible for a nonlinear 
increase in the severity and penetrance of vertebral defects. 
Our analysis identified 2 novel loci associated with DBP 
with nonlinear genetic effects. They can be missed by the 
traditional linear interaction model. However, because statis- 
tical significance does not necessarily imply causality, further 
experimental validation is needed to confirm the finding. 

As shown in Ma et al [9], the VC model loses power 
because of high degrees of freedom in the test in cases 



where the genetic effect is not very complex, such as in 
a linear form. Thus one should assess constant or linear 
effects first, followed by fitting the corresponding model 
suggested by the results of the tests. In this analysis, we 
found that the coefficients are constant for most SNPs. 

Note that the function a(X) models the overall mean 
of DBP over time when there is no genetic effect. When 
a linear structure for a (X) (= cto + a\X) is forced, we 
observe inflated signals for testing Ho : fi (X) = 0. Thus, 
the incorporation of this nonlinear function can largely 



Table 1 List of SNPs with p value <5 x 10 7 



rs ID 


Gene name 


Chr 


p41 


P31 


p21 


P42 


p43 


rs 1086097 


MIR1263 


3 


1.9 x 10~ 7 


0.009 


0.90 


4.97 X 


0~ 8 


1.08 x 10~ 6 


rs686697 


MIR1263 


3 


3.4 X 10"' 


0.005 


0.76 


9.24 X 


0~ 8 


3.41 X 10 6 


rs483558 


unknown 


3 


4.7 X 10~ 7 


0.007 


0.76 


1.30 x 


o- 7 


3.72 X 10~ 6 


rs9863717* 


unknown 


3 


4.96 X 10~ 8 


0.009 


0.95 


1.23 x 


o- 8 


2.55 X 10~ 7 


rs1575160* 


unknown 


3 


8.7 X 10~ 8 


0.011 


0.70 


2.33 x 


0~ 8 


3.76 X 10~ 7 


rs723877 


UNC13B 


9 


4.3 X 10~ 7 


1.25 X 10~ 6 


2.37 X 10~ 5 


6.1 x 1 


0" 4 


0.02 


rs 10972462* 


UNC13B 


9 


9.5 X 10~ 8 


6.96 X 10~ 7 


1.31 X 10~ 5 


2.3 x 1 


o- 4 


0.007 



indicates significant SNPs after Bonferroni correction. p4i, P31, and p2l are p values for testing j-/^ ; /3(X) = 0 (mean model 4 vs. mean model 1), 
J-[q ; pQ = fi^ = 0 (mean model 3 vs. mean model 1), and j-f^ '. = 0 (mean model 2 vs. mean model 1), respectively; P42 an d p43 are P values for 
testing H 0 : fi(X) = y6 0 and H 0 : y6(X) = P 0 + PiX, respectively. Small values of p42 and P43 indicate nonlinear G x E effect. The small p values for 
those 3 SNPs with unknown gene names in chromosome 3 are in high linkage disequilibrium with those in gene MIR1263. 



Wang et al. BMC Proceedings 2014, 8(Suppl 1):S61 
http://www.biomedcentral.eom/1753-6561/8/S1/S61 



Page 5 of 6 




t 1 1 1 — r 

13 15 17 19 21 



O 



rn ■ i ■. ■ . 

3 1 t!-i 'ii/afei^t i^-^'isii^i &&rmJfo- 

CD 



■ ■ 



■ ■ ■ 




03 - £ 



■ . r 




Chromosome 

Figure 2 Fitted mean function and the estimated varying coefficients. Shown are effects for SNP rs9863717 in gene MIR1263 (chr3) and 
SNP rsl 0972462 in gene UNC13B (chr9). The observed data are shown in lighter color in the background. 



reduce false positives. In this analysis we coded the 
genetic variable G in an additive fashion, although other 
disease models such as dominant or recessive can also 
be assumed, while the optimal one can be selected 
based on a model selection criterion such as BIC. 
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